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The study of fingerprint individuality aims to determine to what 
extent a fingerprint uniquely identifies an individual. Recent court 
cases have highlighted the need for measures of fingerprint indi- 
viduality when a person is identified based on fingerprint evidence. 
The main challenge in studies of fingerprint individuality is to ade- 
quately capture the variability of fingerprint features in a population. 
In this paper hierarchical mixture models are introduced to infer the 
extent of individualization. Hierarchical mixtures utilize complemen- 
tary aspects of mixtures at different levels of the hierarchy. At the 
first (top) level, a mixture is used to represent homogeneous groups 
of fingerprints in the population, whereas at the second level, nested 
mixtures are used as flexible representations of distributions of fea- 
tures from each fingerprint. Inference for hierarchical mixtures is more 
challenging since the number of unknown mixture components arise 
in both the first and second levels of the hierarchy. A Bayesian ap- 
proach based on reversible jump Markov chain Monte Carlo method- 
ology is developed for the inference of all unknown parameters of 
hierarchical mixtures. The methodology is illustrated on fingerprint 
images from the NIST database and is used to make inference on 
fingerprint individuality estimates from this population. 

1. Introduction. Recent court cases have highlighted the need for report- 
ing error rates when an individual is identified based on forensic evidence 
such as fingerprints. In the case of Daubert v. Merrell Dow Pharmaceuticals 
[Daubert v. Merrell Dow Pharmaceuticals Inc. (1993)], the U.S. Supreme 
Court ruled that in order for expert forensic testimony to be allowed in 
courts, it had to be subject to five main criteria of scientific validation, that 
is, whether (i) the particular technique or methodology has been subject to 
statistical hypothesis testing, (ii) its error rates have been established, (iii) 
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standards controlling the technique's operation exist and have been main- 
tained, (iv) it has been peer reviewed, and (v) it has a general widespread 
acceptance [see Pankanti, Prabhakar and Jain (2002) and Zhu, Dass and 
Jain (2007)]. Following Daubert, forensic evidence based on fingerprints was 
first challenged in the 1999 case of U.S. v. Byron C. Mitchell, stating that 
the fundamental premise for asserting the uniqueness of fingerprints had not 
been objectively tested and its potential matching error rates were unknown. 
Subsequently, fingerprint based identification has been challenged in more 
than 20 court cases in the United States. To address these concerns, several 
research investigations have proposed measures that characterize the extent 
of uniqueness of fingerprints (i.e., fingerprint individuality); see Pankanti, 
Prabhakar and Jain (2002), Zhu, Dass and Jain (2007) and the references 
therein. The primary aim of these measures is to capture the inherent vari- 
ability and uncertainty when an individual is identified based on fingerprint 
evidence. 

The statistical test of hypotheses for fingerprint based identification can 
be set up as follows: Consider an input fingerprint with an unknown identity 
It being compared to the fingerprint of a claimed identity I c . The test of 
hypotheses is 

(1.1) H :I t ^I c versus H 1 :I t =I c , 

where Hq (resp., Hi) is the hypothesis of a negative (resp., positive) identifi- 
cation. The hypotheses posed in the order of negative vs. positive identifica- 
tion (as opposed to the reverse order) allows us to control for the probability 
of making a false positive identification (i.e., the probability of Type I error). 
The test of Hq versus Hi in (1.1) is carried out by ascertaining the degree 
of similarity between the two prints and involves two important steps: First, 
salient fingerprint features are extracted from each print, and second, the 
collection of features of the two prints are "matched" with each other to 
obtain the best measure of similarity. 

Figure 1 illustrates an example of the feature extraction and matching pro- 
cedures described in the previous paragraph. Typical fingerprints as in Fig- 
ure 1 consist of smooth, nonintersecting flow patterns with alternating dark 
and light lines, called ridges and valleys, respectively. Occasionally, a ridge 
will either bifurcate or terminate and give rise to an anomaly. The anomalies 
in the ridge structures are called minutiae which are the fingerprint features 
used for identifying individuals. Figure 1 shows the locations of minutiae 
(x € R 2 ) as white squares for the two fingerprint images extracted using 
a pattern recognition algorithm described in Zhu, Dass and Jain (2007). 
Minutiae information of a fingerprint is easy to extract, permanent (does 
not change with time) and unique (distinct minutiae patterns for different 
individuals), making it a popular method for identifying individuals in the 



HIERARCHICAL MIXTURES 



3 



forensics community. Subsequently, the number of matches is determined 
by an optimal rigid transformation that brings the two sets of minutiae as 
close to each other as possible and counting the number of minutiae in the 
right panel that falls within a square of area 4tq centered at each minutiae 
in the left panel; vq is a small prespecified number relative to the size of the 
fingerprint image. A higher number of matches indicates a higher degree of 
similarity and favors the rejection of Hq in (1.1). 

The number of matching minutiae in Figure 1 is 25 but the question is: 
Should Hq be rejected? In that case, what is the uncertainty or error associ- 
ated with the decision? This is precisely the issue of fingerprint individuality 
since error rates associated with the observed match are unknown. Pankanti, 
Prabhakar and Jain (2002) and Zhu, Dass and Jain (2007) propose using the 
probability of a random correspondence (PRC) as a measure of fingerprint 
individuality. Mathematically, the PRC is expressed as 

(1.2) PRC(w\m,n) = P(S>w\m,n), 

where the random variable S denotes the number of minutiae matches, w 
denotes the observed number of matches, and m and n, respectively, are the 
number of minutiae in the two fingerprint images. The probability in (1.2) 
is calculated assuming Hq is true, that is, the pair of prints are impostors 
coming from two different individuals. Small (resp., large) values of the PRC 
indicate low (resp., high) levels of uncertainty which correspond to high 
(resp., low) extent of fingerprint individualization. A PRC of 0.0004, for 
example, indicates that only 4 out of 10,000 impostor matches will result in 
matching numbers that are greater than or equal to w. So, having observed 
w causes us to suspect that Hq may not be true. The uncertainty associated 
with this suspicion decreases as the PRC gets smaller (i.e., closer to 0). 




Fig. 1. Illustrating minutiae matching [taken from Pankanti, Prabhakar and Jain 
(2002)]. A total of m — 64 and n = 65 minutiae were detected in left and right image, 
respectively, and 25 correspondences (i.e., matches) were found. The white squares and 
lines, respectively, represent the minutiae location and the direction of ridge flow at that 
minutiae. 
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The connection between the PRC and the hypothesis testing criteria in 
Daubert (which is one of the five main criteria for the scientific validation 
of forensic evidence) can be seen as follows: Under the hypotheses testing 
of (1.1), the PRC is the p-value, computed under Hq, corresponding to the 
observed number of matches w. 

The value of the PRC depends on the distribution of minutiae locations 
in a pair of prints. Zhu, Dass and Jain (2007) demonstrated that when m 
and n are large, the distribution of S in (1.2) can be approximated by a 
Poisson distribution with mean (expected) number of matches 

(1.3) X(q 1 ,q 2 ,m,n) = mnp(q 1 ,q 2 ), 

where qt l ,h = 1,2 are the distributions fitted to the minutiae locations in the 
pair of prints, and p(q±,q 2 ) is the probability of a match given by 

(1.4) v{QX,Q2)= I / qi(x)q 2 (y)dxdy, 

where x € R 2 and y £ R 2 are independent minutiae from q\ and q 2 , respec- 
tively, and S(y,ro) is the square of area 4rg centered at y. 

The reliability of the PRC computed from a sample of fingerprints depends 
on (1) how well elicited statistical models fit the distribution of minutiae for 
different fingerprints, and (2) whether the sample is representative of the 
target population. The aim in this paper is to develop methodology for (1) 
while implicitly assuming the validity of (2). Thus, the results in Section 5 are 
valid for a population which has the fingerprint database as a representative 
sample. 

It is well known, for example, that the distribution of minutiae locations 
in fingerprints tend to form clusters [see, e.g., Scolve (1979), Stoney and 
Thornton (1986) and Zhu, Dass and Jain (2007)]. Thus, candidate statistical 
models have to meet two important requirements: (i) flexibility, that is, 
the model can represent a variety of minutiae distributions for different 
fingerprints, and (ii) associated measures of fingerprint individuality can be 
easily obtained from these models. These considerations led Zhu, Dass and 
Jain (2007) to propose mixture distributions as candidate choices for q\ and 
q 2 . Based on mixtures of independent normals, the analytical expression for 
p(qi,Q2) in (1-6) becomes 

K x K 2 2 

(1-5) P(qi, ft) = 4r 2 £ £ II fc(0| (/$ " A ^tlf + ^tlfl 
fc=l fc'=16=l V " ' S *~ ' 

fjL a 

where q h (x) = Efii IlLi M* (b) \$h> ^kl?) for h = ^ * = (z ( V {2) ) 
and (pi(-\fi, a 2 ) is the normal density with mean /j, and variance a 2 . 
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One drawback of Zhu, Dass and Jain (2007) is that no statistical model 
is elicited on the minutiae for a population of fingerprints; standard mix- 
ture distributions were proposed for minutiae distributions in each finger- 
print separately. As a result, no inference (e.g., confidence intervals) can 
be obtained for the population version of the PRC. This is our motivation 
for developing hierarchical mixture models and related inferential tools in 
this paper. The hierarchical mixture model [see (2.1)] is a model on the 
minutiae for a population of fingerprints that satisfies both requirements 
of (i) flexibility and (ii) computational ease mentioned earlier. We assume 
that the fingerprint population consists of G homogeneous groups with re- 
spect to the distribution of minutiae, with q g and w g , respectively, denoting 
the distribution of minutiae locations and population proportion of the gth 
sub-population, g = 1, 2, . . . , G. For a fingerprint pair coming from the sub- 
populations g\ and 52 with 1 < gi, gi < G, we have gi = q gi and qi = q g2 in 
(1.3). Hence, it follows that the population mean PRC corresponding to w 
observed matches in the population is given by 

G G 

(1.6) PRC(w\m,n) = ^ ^ uj gi uj g2 P(S > w\X(q gi ,q g2 ,m,n)), 

91=132=1 

where S follows a Poisson distribution with mean X(q gi ,q g2 ,m,n). 

In this paper a Bayesian framework for the inference from hierarchical 
mixture models is developed, which in turn can be used to make inference 
for the population mean PRC in (1.6). Hierarchical mixture models contain 
an unknown number of mixture components at two levels. Green (1995) and 
Green and Richardson (1997) developed the reversible jump Markov chain 
Monte Carlo (RJMCMC) approach for estimating the unknown number of 
mixture components by exploring the space of models of varying dimensions. 
The RJMCMC procedure developed in this paper generalizes the work of 
Green and Richardson (1997) to hierarchical mixture models with two levels 
of hierarchy. The rest of the paper is organized as follows: Section 2 devel- 
ops hierarchical mixture models for a heterogeneous population of objects 
(the objects are fingerprints in our application). Sections 3 and 4 develop 
the Bayesian and RJMCMC framework for inference from hierarchical mix- 
ture models. Section 5 discusses the application to fingerprint analysis using 
PRCs. 

2. Hierarchical mixture models. Consider an object, O, selected at ran- 
dom from a heterogenous population, V, with G (unknown) groups. Let X = 

(xi,X2,x%, . . .) denote the observables on O where xj = {x^~\xy , . . . 

is a (i-variate random vector in R d . A hierarchical mixture model for the 
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distribution of O in the population is 

G n 

(2.1) q(x)=^2u) g Y[q g (x j ), 

9=1 j=l 

where x = (x±,X2, ■ ■ ■ , x n ) are the n observations made on O, co g ,g = 1, 2, . . . , G 

are the G cluster proportions with u g > and Ylg^^g = 1> %(') * s the mix- 
ture density for the 5th cluster given by 

( 2 - 2 ) = ^^ fc f^ fc f( x l Afc 9)' 

fc=i 

with fk g denoting a density with respect to the Lebesgue measure on R d , p kg 
denoting the mixing probabilities satisfying: (1) p^g > and (2) Yl!k=iPkg = 
1, and \k g denoting the set of all unknown parameters in fkg- Identifiability 
of the hierarchical mixture model of (2.1) with respect to its components is 
achieved by imposing the constraints 

(2.3) U}\ < 0J2 < ■ ■ ■ < ojg and \\ g -< \ 2g -<■■■< \K g g 

for each g = 1, 2, . . . , G, where -< is a partial ordering to be defined later. 
The set of all unknown parameters in the hierarchical mixture model (2.1) is 
denoted by 6 = (G, u>, K, p, A), where u = (u?i, W2, . . . ,wg)i K = (Ki,K<2, . . . , 
Kg), P = (Pkg,k = l,2,...,K g ,g = 1,2,...,G) and A = (X kg ,k = 1,2,..., 
K g ,g = l,2,...,G). 

Hierarchical mixture models consists of two levels of hierarchy: At the 
first (top, or G) level, the mixture is used to represent the groups, whereas 
at the second (or Kg) level, nested mixture models (nested within each 
g = 1, 2, . . . , G specification) are used as a flexible representation of the dis- 
tribution of observables. The unknown number of mixture components, or 
mixture complexity, arise at both levels of the hierarchy, and is, therefore, 
more challenging to estimate compared to standard mixtures. Estimating 
mixture complexity has been the focus of intense research for many years, 
resulting in various estimation methodologies in a broad application domain. 
Nonparametric methods were developed in Escobar and West (1995) and 
Roeder and Wasserman (1997), whereas Ishwaran, James and Sun (2001) 
and Woo and Sriram (2007) developed methodology for the robust estima- 
tion of mixture complexity for count data. As discussed earlier, our approach 
for estimating mixture complexity will be Bayesian based on the RJMCMC 
algorithm. 

In the subsequent text we assume each f^ g is multivariate normal with 
mean vector ^ kg = (//jy 1 , fjj^ , . . . ,fJ^)' S R d and covariance matrix J2kg e 
R d x R d . 

Our analysis on the fingerprint images in the NIST database (see Section 
5) reveal that it is adequate to consider diagonal covariance matrices of the 
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form J2k 9 = dia g(( cr fcp) 2 '( <T fcg) 2 '---'( cr ig ) ) 2 )' where (^kgl is the variance 
of the bth component. Four different choices of the covariance matrix ^2f, g 

are considered, namely, diagonal covariance matrix with (i) common entries 
over k [i.e., = a g d , for some common value of <J g ], (ii) different entries 
over k, unrestricted covariance matrix with (iii) common entries over k, 
and (iv) different entries over k. These four choices are evaluated using the 
Bayes Information Criteria (BIC) which is a model selection criteria that 
favors parsimonious models consistent with the observed data. The highest 
BIC was found for the choice of diagonal covariance matrix of (i) or (ii) for 
almost all of the fingerprints in the NIST database; see Table 1. 
Thus, we take the density fk g in (2.2) to be 

d 

(2.4) / fcg (x|A fc5 ) = ^(x|/, fc9 , CT% ) = n^i(^ ) l4^(4 6 9 ) ) 2 )> 

b=l 

where 0i (■ <t 2 ) denotes the density of the univariate normal distribution 
with mean fi and variance a 2 , and cr kg = ((cj^) 2 , (cj^) 2 , . . . , (cj^) 2 )' is the 
ci-variate vector of the variances. The second identifiability condition of (2.3) 
is re-expressed in terms of the first component of the mean vector as 

(2-5) $<d2<-<l$. B - 

For N independent objects selected randomly from the population, it 
follows that the distribution of observables for the ith object, i = 1,2, ... ,N 
has the density 

G ni K g 

( 2 -6) q(x i )=^2uj g Y[^2p kg (j) d (xij\fj, kg ,(T k g), 

s=i i=ifc=i 

where = (xij,j = 1,2,..., nj) is the set of nt observations made on the ith 
object with each Xij € R d , for j = 1, 2, . . . , m. It follows from independence 
that the joint distribution of all observables, x = {x i ,i = 1,2,..., N), from 
N objects is given by Ili=i 9 Ci- 
table 1 

Covariance matrix selection: Entries give the number and percentages of fingerprint 
images in the NIST database that ranked each covariance model as the top choice based 

on BIC 



Covariance choice 


(i) 


(ii) 


(iii) 


(iv) 


Total 


Frequency 


1731 


238 





29 


1998 


Percentage 


86.64 


11.91 





1.45 


100.00 
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Two other notations are introduced here: /x and a will respectively denote 
the collection of all {fJ-k g , k = 1, 2, . . . , K g , g = 1,2, . . . , G} and {cr^g, k = l,2, 
. . . , Kg, g = 1,2,..., G} vectors. Our goal is to infer the unknown parameters 
= (G,u, K, p, fi, a) based on the observed data x. 

3. A Bayesian framework for inference. For the subsequent text, some 
additional notation is introduced. The symbol I{S) will denote the indica- 
tor function of the set <S, that is, I(S) = 1 if S is true, and 0, otherwise. 
The notation A,B, . . .\C,D, . . . will denote the distribution of random vari- 
ables A, B, . . . conditioned on C,D, . . . , with ir(A, B, . . . \C, D, . . .) denoting 
the specific form of the conditional distribution. Also, tt(A, B, . . . |-) will de- 
note the distribution of A, B, . . . given the rest of the parameters. We specify 
a joint prior distribution on in terms of the hierarchical specification 

(3.1) 7r(6>) = 7r(G,K)-7r(a;,p|G,K)-7r(ju|G,K)-7r((T|G,K). 

The component priors in (3.1) are as follows: 

(1) The prior on the mean vector is taken as 

G r / K 9 

KglUM^^T 2 ) 



7r(/x|K,G) = I] 

3=1 L \ k=l 

(3-2) x(/(^<^<...<^ 9 )) 



Miiii^aw 

\b=2k=l 

The indicator function appears due to the identifiability constraint (2.3) im- 
posed on fx with resulting normalizing constant K g \ for each g = 1, 2, . . . , G. 

(2) The prior distribution of the variances is taken as 

(3.3) 7t(<t|K, G) = f[ ( J] II IG((«7g) 2 |ao, A) 

3=1 \fe=16=l 

where IG denotes the inverse gamma distribution with prior shape and scale 
parameters ao and /3q, respectively. 

(3) The prior on the first and second level mixing proportions is taken 

as 

vr(cj,p|G,K) = G\D G {u\5u) ■ <u 2 < ■■■<u G ) 

(3.4) 

X Yl D K g (Pg\Sp), 

9=1 
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where Dh(-\5) denotes the .ff-dimensional Dirichlet density with the H- 
component baseline measure (5, S, ... ,5), where 5 is a prespecified constant, 
and p g = (pi g ,P2g, ■ ■ ■ ,PKg,g)'- The indicator function arises due to the im- 
posed identifiability constraint (2.3) on u. It follows that G7! is the appropri- 
ate normalizing constant for this constrained density, obtained by integrating 
out lj and noting that Dq(u)\6 u ) is invariant under different permutations 
of u>. 

(4) The prior on G? and K is taken as 



G 

(3.5) n(G,K) = vr(G) • n(K\G) = tt (G) • JJ n (K gJ) 

9=1 



where ttq is the discrete uniform distribution between G m i n and G max (resp., 
-f^min to K max ), both inclusive, for G (resp., K g ). 

The prior on 9 depends on the hyper-parameters 5 P , 8 U , G max , G mm , 

R rnin j ^max j 

Hq, t 2 , a and /3 , all of which need to be specified for a 
given application. The reader is referred to our technical report [Dass and 
Li (2008)] for these specifications. 

The likelihood of the hierarchical mixture model involves several sum- 
mations within each product term and is simplified by augmenting vari- 
ables to denote the class labels of the individual observations. Two different 
class labels are introduced for the two levels of hierarchy: (1) The aug- 
mented variable W = (W\, Wi, ■ ■ ■ , Wjv) denotes the class label of the G 
sub-populations, that is, Wi= g whenever object i arises from the gth sub- 
population, and (2) Z = (Zi, Z2, ■ ■ ■ , Zn) with Zi = (Zy,j = 1, 2, . . . , m), 
where Zij = k for 1 < k < K g if Xij arises from the kth mixture compo- 
nent ^d(-|/x fcg , cTkg). We denote the augmented parameter space by the same 
symbol 6 as before, that is, 6 = (G, u, K, p, /x, <r, W, Z). The augmented 
likelihood is now 

£(G, a;, K, p, /x, <r, W, Z) 

(3.6) 

N m G K g 

= n n uu^m^^y^^, 

i=\ j = lg=lk=l 

with priors on W and Z given by 

(3.7) vr(W,Z|G,K,cj,p) =7r(W|G,w) ■ tt(Z|G,K, W,p), 

where jr(W|G,w) = U^iU%iVg m=9) and 

G m K g 

vr(ziG,K,w, P) =n n niKf^ fc) 

g=li:W t =gj=lk=l 
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Based on the augmented likelihood and prior distributions, one can write 
down the posterior distribution (up to a normalizing constant) via Bayes 
theorem. The posterior has the expression 

vr(0|x) oc^(G,cj,K,p,/z,cr,W,Z) x 7r(W,Z|G,K,w,p) 

(3.8) 

x Tr(G,K,u,p,n,cr) 
based on (3.1), (3.6), (3.7) and observed data x. 

4. Posterior inference. The total number of unknown parameters in the 
hierarchical mixture model depends on the values G and K. Thus, the pos- 
terior in (3.8) can be viewed as a probability distribution on the space of all 
hierarchical mixture models with varying dimensions. To obtain posterior 
inference for such a space of models, Green (1995) and Green and Richard- 
son (1997) developed the RJMCMC for Bayesian inference. In this paper we 
develop a RJMCMC approach to explore the posterior distribution in (3.8) 
resulting from the hierarchical mixture model specification. We briefly dis- 
cuss the general RJMCMC implementation here. Let 9 and 9* be elements 
of the model space with possibly differing dimensions. The RJMCMC ap- 
proach proposes a move, say, m, with probability r m . The move m takes 9 
to 9* via the proposal distribution q m (0,9*). In order to maintain the time 
reversibility condition, we require to accept the proposal with probability 

(4.1) a{6,6 ) =min< 1, 



7r(0|x) r m q m (6,6* 

in (4.1), q m i(0*,9) represents the probability of moving from 6* to 9 based 
on the "reverse" move m', and ir(9\x.) denotes the posterior distribution of 
9 given x. It is crucial that the moves m and m! be reversible [see Green 
(1995)], meaning that the densities q m (9,9*) and q m >(9*,9) have the same 
support with respect to a dominating measure. In case 9* represents the 
higher dimensional model, we can first sample u from a proposal qo(9,u) 
(with possible dependence on 9), and then obtain 9* as a one-to-one function 
of (9,u). In that case, the proposal density q m (9,9*) in (4.1) is expressed 
as 

B9* 

(4.2) q m (0,9*) = qo(0,n)/det — — 

o(V,u) 

where g^ u ) denotes the Jacobian of the transformation from (9, u) to 
9*, and det represents the absolute value of its determinant. If the triplet 
(9, u, 9*) involves some discrete components, then the Jacobian of the trans- 
formation is obtained by the one-to-one map of the continuous parts of 9* 
and (9,u), which can depend on the values realized by the discrete compo- 
nents. 
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For the inference on hierarchical mixture models, five types of updating 
steps are considered with reversible pairs of moves, (m,m'), corresponding 
to moves in spaces of varying dimensions. The outline of the steps are as 
follows: 

(1) Update G with (m,mf) = (G-split, G-merge), 

(2) Update K|G,cj, W with (m,m') = (if-split, K-merge), 
(4.3) { (3) Update cj|G, K, W, Z, p, /x, <r, 

(4) Update W, Z|G, K,w,p, /x,<x and 

(5) Update p, <r|G, K, cj, W, Z. 

Our methodological contribution is the development of the Update G 
steps (G-split and G-merge) based on a pair of reversible jump moves. 
The steps for merging and splitting G are described in detail in the Appendix. 
The Update K steps are similar to that of Green and Richardson (1997). 
The other steps (3)— (5) do not involve jumps in spaces of varying dimen- 
sions, and can be carried out based on a regular Gibbs proposal. One cycle 
through steps (l)-(5) completes one iteration of the RJMCMC sampler. 

The assessment of convergence of the RJMCMC is carried out based on 
the methodology of Brooks and Guidici (1998, 2000). A total of 3 chains are 
run from different starting points and different variance components of the 
log-likelihood are calculated to obtain 3 diagnostic plots, namely, the plots 
of (i) the overall and within chain variance, V and W c , (ii) within model 
and within chain within model variances, W m and W m W c , and (hi) between 
model and between model within chain variances, B m and B m W c , against 
the number of iterations. The merging of the two lines in each plot indicate 
that the chains have sufficiently mixed. 

5. Assessing fingerprint individuality. Our inferential methodology for 
assessing fingerprint individuality is illustrated using fingerprint images from 
the NIST Special Database. The NIST fingerprint database is publicly avail- 
able and consists of 2000 8-bit gray scale fingerprint image pairs of size 512- 
by-512 pixels. Because of the similarity of the image pairs, only the first 
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image of each pair was used in the statistical modeling. The algorithm de- 
scribed in Zhu, Dass and Jain (2007) (also mentioned in the Introduction) 
was used to extract minutiae from these images; minutiae could not be au- 
tomatically extracted from two images of the NIST database due to poor 
quality and these were discarded from further consideration. Figure 2 shows 
examples of two fingerprint images from the NIST database with minutiae 
locations indicated by white squares. 

The RJMCMC algorithm developed in the previous section is used to 
obtain the posterior distribution of PRC. The first Nq = 100 fingerprint im- 
ages from the NIST database are taken as the sample and three chains with 
starting values obtained using the clustering procedure of Zhu, Dass and 
Jain (2007) are run. Figure 3 gives the diagnostic plots of the RJMCMC 
sampler which establish convergence after a burn- in of B = 250,000 itera- 
tions. The posterior distribution of PRC (corresponding to m = 64, n = 65, 
w = 25 and ro = 15 pixels) based on 1000 realizations of the RJMCMC af- 
ter the burn-in period is given in Figure 4 with a posterior mean of 0.6859 
and the 95% HPD interval given by [0.63,0.735]. We conclude that if a fin- 
gerprint pair was chosen from this population with m = 64, n = 65 and an 
observed number of matches w = 25, there is high uncertainty in making a 
positive identification. Our analysis actually indicates that the fingerprints 
in Figure 1 represent a typical impostor pair. The 95% HPD set suggests 
that the PRC can be as high as 0.735, that is, about 3 in every 4 impostor 
pairs result in 25 or more matches. 

How many matches does it take to positively identify an individual? Dif- 
ferent countries around the world have different standards [Girard (2007)]. 
In the Netherlands, this number is 12, whereas in South Africa, it is 7. In 
the United States and the UK, this number is not fixed and depends on 
expert testimonial. To assess the level of uncertainty associated with these 
standards, we conduct a study of the PRC based onro = 7 matches. The best 




x 10 5 x 10 5 x 10 5 



(a) (b) (c) 

Fig. 3. Convergence diagnostics for the NIST fingerprint database with No = 100. Panels 
(a), (b) and (c), respectively, show the plots of (V,W C ), (W m ,WmW c ) and (B m ,B m W c ) 
as a function of the iterations. The x-axis unit is 10,000 iterations. 
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Fig. 4. Posterior distribution o/PRC based on 1000 realizations of the RJMCMC after 
300,000 iterations for N = 100. 

case scenario corresponds to the combination (m,n,w) = (7,7,7) (when all 
query and template minutiae match with each other) with a mean PRC of 
5.09 x 10 -5 in Table 2. Note that 7 matches has moderate strength of evi- 
dence for declaring a positive match; the PRC implies 5 in 100,000 impostor 
fingerprint pairs will have all 7 minutiae match with each other. It is also 
very unlikely that n = 7 in real life since fingerprints lifted from a crime 
scene have far lesser number of minutiae (thus, m <C re) compared to the 
template it is being matched to. In this latter case, the PRCs are far larger 
(see Table 2), making the case for positive identification even weaker. 

To compare the results of inference using a larger sample size, we ran the 
RJMCMC sampler for the first Nq = 200 and 500 fingerprint images in the 
NIST database. The computational complexity increases in two ways: first, 
it takes longer, on the average, to complete one iteration of the RJMCMC 
and second, the RJMCMC takes a longer time to converge. On our personal 
computer with processing speed 2.66 GHz and 1.96 GB of RAM, it took 
about 12.5, 31.8 and 90.0 hours, respectively, to generate every 50,000 it- 
erations of the RJMCMC for N = 100, 200 and 500. While the RJMCMC 
converged at 300,000 iterations for Nq = 100, the chain did not converge 
even at B = 350,000 iterations for N Q = 200 (see Figure 5) and 7V = 500 
(the diagnostic plots are not shown). 

Discussion: The RJMCMC sampler is able to accommodate all Nq = 1998 
fingerprint images from the NIST database. However, this chain is extremely 
slow at mixing, and therefore, we do not expect convergence to occur in real 
time on our computers. Computational demand magnifies exponentially for 
very large databases such as the US- VISIT program. Thus, for implemen- 

Table 2 

Mean PRCs for the combinations (7, 7,n) 













1 



n 7 10 15 55 65 75 

Mean PRC 5.09 x 10" 5 1.40 x 10" 4 3.25 x 10 -4 0.0155 0.0333 0.0614 
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X 10 5 X 10 5 X 10 5 

(a) (b) (c) 

Fig. 5. Convergence diagnostics for the N I ST fingerprint database with No = 200. Panels 
(a), (b) and (c), respectively, show the plots of (V,W C ), (W m ,W m W c ) and (Bm^mWc) 
as a function of the iterations. The x-axis unit is 10,000 iterations. 



tation on very large databases, results can be obtained with the help of 
high-end computing facilities. 

One enormous advantage of the methodology outlined in this paper is 
that the RJMCMC needs to be run on large databases only once. After 
convergence is achieved, inference on PRC for any combination of (w,m,n) 
can be obtained using formula (1.6). As an illustration based on our smaller 
sample size of Nq = 100, Table 3 gives the results of this analysis for different 
combinations of (w,m,n). The entries of Table 3 provides a general guide- 
line to FBI and forensic experts on the extent of uncertainty associated with 
making a positive identification. Note that when (w,m,n) = (25,64,65), the 
PRC is high, indicating a low extent of individualization. However, Table 
3 also provides several combinations of (w,m,n) that favor positive identi- 
fication with a high degree of individualization. For example, we can look 
at entries in Table 3 for 95% HPD sets that fall entirely below a threshold, 
say, To. With the choice of To = 0.003 (that is, 3 in every 1000 impostor 
fingerprint pairs will have w or more observed matches), the combinations 
that allow for positive identification with uncertainty level of at most Tq 
are (45,54,55), (50,54,55), (53,54,55), (50,64,65) and (53,64,65); for these 
combinations, the probability that the true PRCs occur below Tq is at least 
95%. For larger values of No, the size of the HPD sets will decrease due to 
decreasing variability of the estimate of PRC. 

In this paper we only considered a two level hierarchical mixture model. 
The US- VISIT program now requires individuals to submit prints from all 
10 fingers. This is the case of a 3-level hierarchical mixture model; in the first 
(top) level, individuals form the G groups based on similar characteristics 
of their 10 fingers, and the distribution of features in each finger is mod- 
eled using standard mixtures. Any higher level hierarchical mixture models 
will be more involved in two ways: (1) The computational costs, including 
memory and time, since convergence will be much slower to achieve, and 
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Table 3 

Posterior means and 95% HPD sets calculated based on 1000 realizations 
of the RJMCMC for N = 100 





Mean 




HPD 






(m, 


n) = (54,55) 


25 


1.33 x 10 _1 




(1.16,1.53) x 10" 1 


35 


1.70 x 10~ 3 




(0.80,3.40) x 10 -3 


45 


5.69 x 10" 4 




(0.0007,2.40) x 10 -3 


50 


5.68 x 10~ 4 




(0.00002,2.40) x 10~ 3 


53 


5.68 x 10~ 4 




(0.0002,7.34) x 10~ 4 






(m, 


n) = (64,65) 


25 


6.84 x 10 _1 




(6.27,7.26) x lO" 1 


35 


9.22 x 10~ 2 




(0.77,1.09) x 10" 1 


45 


1.90 x 10 -3 




(0.93,3.60) x 10~ 3 


50 


6.42 x 10~ 4 




(0.0051,2.40) x 10~ 3 


53 


5.80 x 10~ 4 




(0.0089,2.40) x 10 -3 






(m, 


n) = (74,75) 


25 


9.50 x 10" 1 




(8.99,9.87) x 10" 1 


35 


6.10 x 10" 1 




(5.54,6.57) x 10" 1 


45 


9.91 x 10 -2 




(0.81,1.20) x 10" 1 


50 


2.12 x 10~ 2 




(1.63,2.73) x 10~ 2 


53 


7.10 x 10~ 3 




(5.10,9.80) x 10~ :i 



(2) the development of reversible moves such as G-merge and G-split for 
the higher level of mixtures. We are of the view that the best estimate of 
the population PRC can be obtained if the data is characterized by a model 
that best represents the way the data is structured and observed. In the case 
of the US- VISIT, the 3- level hierarchical mixture model is indeed the right 
way to view the available data. Further research will be needed to see how 
the computational complexity can be reduced. The availability of high-end 
computing facilities will definitely be a requirement for fitting higher level 
hierarchical mixtures. 

The central issue for extending the proposed analysis to other biometrics, 
such as face and iris, is the type of feature extracted for each of the differ- 
ent biometrics. The framework of hierarchical mixture models will apply to 
these biometric traits but we have to develop mixture models on different 
feature spaces. The features we used in this paper were minutiae locations, 
and therefore, we needed mixture models on points in R 2 . An additional 
feature for fingerprints are the minutiae directions (the white lines in Figure 
1). In order to run a similar analysis, one would need to develop suitable 
mixture models on the product space R 2 x [0,27r). Similarly, in the case 
of iris, the feature used is the IrisCode (consisting of a rectangular array 
of 0s and Is), and so the statistical models that have to be developed are 
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potentially Markov Random Field models (since there is significant spatial 
dependence between neighboring Os and Is) indexed by a set of parameters. 
Then, one could postulate that the population consists of G such groups of 
MRF models. We will also need a distribution for the number of matching 
features and derive the distribution of this under impostor pairs of IrisCodes. 

6. Summary and future work. We have developed Bayesian inference 
methodology for hierarchical mixture models with application to finger- 
print individuality. One way to further reduce the level of uncertainty for 
a fixed combination (w,m,n) is to increase the number of features used for 
matching. Our future work will be to derive hierarchical mixture models on 
the extended feature space consisting of minutiae locations and directions. 
The challenge here is that the angles are significantly spatially correlated and 
the minutiae locations exhibit clusters. We are currently developing a model 
that can account for these minutiae characteristics. We plan to improve our 
algorithm so that it can be run more quickly on very large databases. Hier- 
archical mixture models have potential use in other areas as well, including 
the clustering of soil samples (objects) based on soil characteristics which 
can be modeled by a mixture or a transformation of mixtures. 

APPENDIX 

In the subsequent text, the identifiability condition (2.5) based on the 
first components of [iy, g for k = 1,2, ... ,K g will be rewritten using the '-<' 
symbol as 

for each g = 1, 2, . . . , G. Let and 6* denote two different states of the model 
space, that is, 

= (G,o^,K,p,/i,cr, W,Z) and 

(A.2) 

6* = (G*,u*,K*,p*,n*,a*,W*,Z*), 
where the *s in (A.2) denote a possibly different setting of the parameters. 

A.l. The Gr-merge move. The G-merge move changes the current G to 
G—l (that is, G* = G — 1) and is carried out based on the following steps: 

Step 1: Two of the G components, say, g\ and g2, with g\ < g2, are se- 
lected randomly for merging into g* with u g * = oo gi + u g2 . 

Step 2: The ET-components, K gi and K g2 , are combined to obtain K g * in 
the following way. Adding K gi + K g2 = K t , we set K g * = (K t + l)/2 if K t is 
odd, and K g * = Kt/2 if K% is even. 
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Step 3: Next, (p ffl , fj, , a gi ) and (p g2 , /j, g2 , a g2 ) are merged to obtain (p s * , 
fj, g *,(T g *) as follows. The identifiability conditions of (A.l) hold for g = g\ 
and g = g2, and must be ensured to hold for g = g* after the merge step. To 
achieve this, the Kt fJ.s are arranged in increasing order 

(A.3) /Xj -< /x 2 -< < fi Kt _x -< H Kt 

with associated probability pj for [ij, for j = 1,2, . . . ,Kf. Thus, pj are a 
re- arrangement of the Kt probabilities in p Sl and p 92 according to the par- 
tial ordering on fi and fi g2 in (A.3). First, the case when Kt is even is 
considered. Adjacent n values in (A.3) are paired 

(A.4) n± -< /x 2 -< /x 3 -< /x 4 -< < n Kt _ x -< fi Kt 



and the corresponding g* parameters are obtained using the formulas p* k 

P2k-1+P2k 



y" 



(A.5) 



Ui.„» = and 

9 P2k-1+P2k 



P2k-\&2k-\ +P2k&2k 



r kg* 



P2k-X +P2k 

for k = 1, 2, ... , .fTg* . To obtain W* and Z*, objects with Wi = g\ or Wj = 52 
are relabeled as W* = g* . For these objects, the allocation to the K g * com- 
ponents is carried out using a Bayes allocation scheme. Explicit expressions 
for the allocation probabilities are provided in Dass and Li (2008). When Kt 
is odd, an index, iq is selected at random from the set of all odd integers up 
to K t , namely, {1,3,5, . . . ,K t }. The triplet (pi , /x io , cr io ) is not merged with 
any other indices but the new p* Q =pi /2. The remaining adjacent indices 
are merged according to Step 3. 

A. 2. The G-split move. The split move is reverse to the merge step 
above and is carried out in the following steps: 

Step 1: A candidate G-component for split, say, g, is chosen randomly 
with probability 1/G. The split components are denoted by g± and g 2 . The 
first level mixing probability, uj g , is split into LJ gi and uj g2 by generating 
a uniform random variable, uq, in [0, 1] and setting oj gi = uoui g and ui g2 = 
(1 - n )w 9 . 

Step 2: The value of K g is transformed to K where Kt is either 2K g — 1 
or 2K g with probability 1/2 each. Once Kt is determined, a pair of indices 
(K gi , K g2 ) is selected randomly from the set of all possible pairs of integers 
in {K min , K min + 1, . . . , if max} 2 satisfying K gi + K g2 =K t .li M is the total 
number of such pairs, then the probability of selecting one such pair is 
1/Mq. The selection of K gi and K g2 determines the number of second level 
components in the g\ and 52 groups. 
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Step 3: The aim now is to split each component of the triplet (p fl , fi g , cr g ) 
into 2 parts: (p ffl , £t fll , er fll ) and (p 92 , n g2 , <x S2 ) such that both and fi g2 
satisfy the constraints (A.l) for g = g\ and gi ■ The case of K gi + K g2 = 
2K g is first considered. A sketch of the split move is best described by the 
diagram in Figure 6, which introduces the additional variables to be used 
for performing the split. In Figure 6, 2p g is considered for splitting because 
the two split components will represent the second level mixing probabilities 
of gi and gi , the sum of which together equals 2. 

For each k, the variable u^ g in Figure 6 takes three values, namely, 0, 1 
and 2 that respectively determines if the split components of 2p^ g , ju fcg and 
<Jk g either (1) both go to component gi-, (2) one goes to component g\ and 
the other goes to gi, or (3) both go to g\. The variables Uk g , k = 1,2, ... , K g 
must satisfy several constraints: (1) ^fc=i u % = K gi , (2) Uk g = 1 for any k 
such that pk g > 0.5, and (3) J2k-u k =h^Pkg < 1 for /i = 0,2. The reader is 
referred to our technical report Dass and Li (2008) for further explanation 
of these restrictions. 

To generate the vector u = {u\ g ,U2 g , ■ ■ ■ ,UK g g)' , we consider all combina- 
tions of u € {0,1, 2}^, and reject the ones that do not satisfy the three 
restrictions. From the total number of remaining admissible combinations, 
Mi, say, we select a vector u randomly with equal probability \/M\. 

Once u has been generated, a random vector v = (vk g , k = 1, 2, ... , K g ) 
is generated to split 2p 9 (see Figure 6). Some notation are in order: Let 
Aq = {k : Uk g = 0}, A\ = {k : u^g = 1} and A2 = {k : Uk g = 2}. As in the case 
of u, a few restrictions also need to be placed on the vector v. To see what 



U\g U2g ■ ■ ■ Ukg ■ ■ ■ U K g g 

2 Pl 9 ■< 2 P2g -< -< 2Pk. g ■< -< 1VK 99 

J, J, \, J, 

Vlg V 2 g ■•• Vkg ••• V Kg ,g 

y\ s\ ••• /\ •■■ /\ 

(1) (2) (1) (2) (1) (2) (1) (2) 

Pig P\g P\g P\g — Pkg Pkg ~' P Kg g P Kg g 

Vlg M2 9 -> Mfc 9 -> y-K s g 

Vlg Vlg Vlg Vlg Vkg Vkg VK g g VK g g 

IS — > 2 g —>...—> Okg O Kg g 

/\ /\ ••• /\ ••• /\ 

Z\g Zlg Z 2 g Z 2 g ••• Z k g Z kg ■■■ Z Kg g Z Kg g 



Fig. 6. Splits o/2p 9 , fi g and er g . The partial ordering -< is the ordering on s. The 
right arrows " represents the sequential split for fi g and cr g . 
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these restrictions are, we denote 

(A.6) p£g = 2v kg p kg and p^ = 2(1 - v kg )p kg 

for k = 1, 2, ... , Kg, to be the split components from 2p kg . Note that depend- 
ing on the value of u kg = 0, 1 or 2, the split components, p kg and P kg , are 
either both assigned to component g2, one to g\ and the other to gi , or both 
to g\ . For the case u kg = 1 , we will assume that p kg is the split probability 

(2) 

that goes to g\ and p kg goes to 52- Note that the mixing probabilities for 
both components g\ and 52 should equal 1. This implies 

(a.7) £ pg+ E 2 ^ = 1 and E E 2 ^ = 1 

for components 51 and 52 j respectively. The second equation of (A.7) is 
redundant if the first is assumed since Ylk ■. feeAi Pkg + Y^k ■. keA 2 ^P k a + 

Efc.-fceAiPfcfl + Efc : feeA 2 ^9 = 2 EfiiPfc 9 = 2 - We rewrite the first equa- 
tion as 

(A.8) a k v kg = 1, 

where ak = Ipkgl (1 — Ylk- keA 2 ^Pkg)- Equation (A.8) implies that the entries 
of the vector v are required to satisfy two restrictions: (1) < Vk g < 1 for 
k = 1, 2, . . . , K g from (A.6), and (2) equation (A.8) above. In Dass and Li 
(2008), an algorithm is given to generate such a v where the proposal density 
can be written down in closed form. 

The split of fi g and a g is carried out by generating two new random 
vectors yk g and Zk g , for k = 1,2, ...,K g ; see Figure 6. The generation of 
y kg is subject to restrictions arising from constraint (A.l) on fj, g . The other 
component of the split of fi g and cr g , y kg and z kg , are obtained by solving two 
(vectorized) linear equations [see Dass and Li (2008)]. Our techical report 
also gives further details of the RJMCMC sampler, including obtaining the 
new first and second level labels as well as the deriving explicit expressions 
for the allocation probabilities and the Jacobian of the transformation from 
(0,u) to e*. 
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